Nabil Salehiyan
Multivariate Analysis Cookbook
The University of Texas at Dallas
Dr. Herve Abdi
Principal Component Analysis
Principal Component Analysis is a method of data analysis that reduces noise, finds components (orthogonal to
each other) that explain the variables, shows similarity between variables (through angles), shows similarity between
observations (through distances), and compresses the information to show only what matters. The goal is to see which
components are the main contributors to the phenomena we observe: for example, which aspects of cell phones
(such as camera megapixels) predict the price range they fall in.
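The figures in this section were produced with dedicated multivariate software; as a minimal sketch of the computation, a PCA of the standardized phone variables might look like this in Python (the file name is an assumption):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

phones = pd.read_csv("mobile_phones.csv")          # hypothetical file name
X = phones[["sc_h", "sc_w", "fc", "pc",
            "px_height", "px_width", "m_dep", "int_memory"]]

# PCA operates on centered (and here, standardized) data
Z = StandardScaler().fit_transform(X)
pca = PCA()
scores = pca.fit_transform(Z)                      # factor scores for the observations

# Eigenvalues (component variances) are what the scree plot displays
print(pca.explained_variance_)
print(pca.explained_variance_ratio_.cumsum())
```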
Data
The data I was given for this analysis describe mobile phones. The observations are grouped by price range (low, med,
high, vhigh). The variables are:
sc_h: screen height
sc_w: screen width
fc: front camera megapixels
pc: primary camera megapixels
px_height: pixel resolution height
px_width: pixel resolution width
m_dep: mobile depth in centimeters
int_memory: internal memory in GB
Methods
In the scree plot we see that 5 or 6 components could be used to describe the data. For simplicity, I will look only at
the first two dimensions.
In this barplot we see that four variables make up the first dimension: sc_w, sc_h, pc, and fc.
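Continuing the sketch above, the contribution of a variable to a dimension is its squared (unit-norm) loading, which is what the barplot displays; values above the average of 1/(number of variables) flag the important contributors:

```python
# pca and X come from the PCA sketch above
ctr = pca.components_.T ** 2                 # variables x components; columns sum to 1
dim1 = pd.Series(ctr[:, 0], index=X.columns).sort_values(ascending=False)
print(dim1)                                  # bars above 1/8 count as "important" here
```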
For dimension 2, we see that the main contributions are made by px_width and px_height.
In a bootstrap analysis I saw that some of the gray variables are also contributors to these dimensions, which suggests
that their failure to contribute in the raw data was due to chance. If this analysis were repeated on samples drawn
with replacement from an infinite population, these other variables could be trusted to replicate as contributors to
dimensions 1 and 2. The only two that did not contribute (in either the raw or the bootstrapped results) were n_cores
and mobile_wt.
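A rough sketch of how such a bootstrap can be computed, resampling observations with replacement and refitting the PCA (dedicated packages handle the details, such as sign alignment, more carefully):

```python
import numpy as np

# Z and pca come from the PCA sketch above
rng = np.random.default_rng(42)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(Z), len(Z))            # resample rows with replacement
    p = PCA(n_components=2).fit(Z[idx])
    L = p.components_.T
    # crude sign alignment with the original solution so replicates are comparable
    L *= np.sign((L * pca.components_.T[:, :2]).sum(axis=0))
    boot.append(L)
boot = np.stack(boot)
# bootstrap ratio = mean / std over replicates; |ratio| > 2 is the usual cutoff
print(boot.mean(axis=0) / boot.std(axis=0))
```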
Here, in the variable-loadings-as-inertia graph, we can confirm the variables that make up the first two dimensions:
sc_w/sc_h and pc/fc make up dimension 1, and px_width/px_height make up dimension 2. We also see that n_cores
and int_memory sit in the middle; they likewise did not show up as significant contributors in the bar graphs. To
investigate whether these variables are true averages or fall in another dimension, we must look to the circle of
correlation.
In this circle of correlation, we see that n_cores and int_memory are still in the center and nowhere near the edge of
the circle, which suggests that they lie in a different dimension. Also, if we look at the variable-loadings-as-correlations
table that was provided, we see that the highest loading for int_memory is on the fourth dimension and the highest
for n_cores on the fifth.
Furthermore, we can infer some correlations between the variables by looking at the arrows. We see that fc and sc_h
are almost orthogonal to each other, which should mean they do not relate to one another. We also see that pc and
sc_w point in almost opposite directions, which suggests a large negative correlation. Lastly, we see some positive
correlations between sc_w and sc_h, fc and pc, and px_width and px_height, because their arrows point in the same
direction. We must keep in mind that, given how far these variables sit from the edge of the circle, we cannot
reliably approximate their correlations from the map alone. To confirm these inferences, I looked at the correlation
matrix heat map.
The heat map confirms all the speculations I made: pc and fc have a positive correlation of .62, px_height and
px_width a positive correlation of .48, and sc_h and sc_w a positive correlation of .48. We also see a negative
correlation of −.19 between pc and sc_w, and no correlation between fc and sc_h (0).
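Both the heat map and the circle-of-correlation coordinates can be recomputed directly from the data; a short sketch, reusing the objects from the PCA code above:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# The heat map is just the raw correlation matrix of the eight variables
sns.heatmap(X.corr(), annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.show()

# Circle-of-correlation coordinates: correlation of each variable with the first
# two factor scores (points near the unit circle are well represented)
r = np.corrcoef(Z.T, scores[:, :2].T)[:Z.shape[1], Z.shape[1]:]
```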
We can now move on to the observation groups, which are the price ranges low, med, high, and vhigh.
Here the colored dots seem to show an inverse relationship: low and high price fall on opposite sides of the map, as
do vhigh and medium price. Also, when we look at the variables projected onto the observation groups, features such
as fc and pc seem to relate to low price (interestingly), screen size falls in the vhigh price category, and pixel
resolution falls between the high and medium prices. To investigate the reliability of this map, we must look at the
tolerance interval and confidence interval graphs.
Here we see overlap between the tolerance intervals, which shows that we cannot reliably assign individual
observations to groups; we cannot say for certain that fc and pc predict low-priced phones or that screen size predicts
high prices.
Lastly, we do not see overlap between the confidence intervals, which suggests that the group averages do in fact
differ. So, although we have low accuracy in group assignment, we can reliably distinguish the average positions of
these price groups from one another.
Summary
Through this data visualization we can infer that, on dimensions 1 and 2, individual cell phones cannot be reliably
assigned to the price-range groups. Some features, such as screen size, can be attributed to higher prices, but in
general the features vary too much to say which variable predicts price. Lastly, the group means are reliably different
on dimensions 1 and 2.
Correspondence Analysis
Correspondence analysis is a method of multivariate analysis for categorical data stored in a contingency table. CA
computes factor scores for the rows and for the columns; these scores can be visualized on the same map because
they share the same variance. Each row is assigned a mass and each column a weight: the greater the mass or weight,
the greater the importance. CA is best used when your data have at least two rows and two columns and you are
searching for similarities in your data and for the strength of those similarities.
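As a concrete sketch of these steps, CA can be written directly as the SVD of the standardized deviations-from-independence matrix (the file name and layout below are assumptions):

```python
import numpy as np

# Assumed: sausages.csv holds the sausages x feelings contingency table
N = np.loadtxt("sausages.csv", delimiter=",")
P = N / N.sum()                        # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)    # row masses, column weights
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(S, full_matrices=False)
F = U * d / np.sqrt(r)[:, None]        # row (sausage) factor scores
G = Vt.T * d / np.sqrt(c)[:, None]     # column (feeling) factor scores
eig = d ** 2                           # eigenvalues for the scree plot
```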
Data
In my data set, I had different brands of sausages and the sensory feelings that tasters reported after eating the
sausages. The sausages are Duby, Chimex, Capistrano, Bafar, and Alpino; the sensory reports include well-being,
relaxed, melancholic, happy, salivating, and more.
The scree plot showed that we could look at up to four dimensions, but for simplicity I will focus on two.
Methods
The contribution barplot for the rows (sausages) on dimension 1 is shown above. Duby and Bafar are the significant
contributors for this dimension. When we look at the bootstrapped row components for dimension 1, the contributions
remain the same. This means the experiment can be reliably replicated, with Duby and Bafar remaining the
significant contributors to the first dimension.
For dimension 2 (rows), Chimex, Capistrano, and Bafar make the significant contributions. When bootstrapped,
Bafar is the only significant contributor to the second dimension. This suggests that if the experiment were replicated
with an infinite population, Bafar would be the main contributor to this dimension.
The factor score map confirms that, for the rows, Chimex and Capistrano make up the second dimension and
Duby and Bafar make up the first dimension. Alpino is not a significant contributor; we must look at the cosine circle
to see whether it is an average sausage or belongs in another dimension.
Looking at the cosine circle, we see that most of the sausages are near the edge of the circle, which tells us that they
are well explained by two dimensions. Alpino, however, is near the center, which tells me it belongs in another
dimension.
For the columns (sensory feelings), we see that the positive side of dimension 1 is made up of Relaxed, Melancholic,
Soothed, and Salivating, and the negative side of Guilty, Well.being, Impressed, and Joy. These findings are
reflected in the factor score map beside it. When a bootstrap analysis is run, Melancholic remains the only significant
contributor to dimension 1, suggesting that the significance of the other contributors likely depended on the sample.
For dimension 2 (columns), the positive side is made up of Well.being, Romantic, and Melancholic, and the negative
side of Famished and Salivating. These are reflected in the column map beside it. When a bootstrap analysis is run
on the data, Romantic and Famished remain the only significant contributors, suggesting that the significance of the
other raw-data contributors was probably due to the sample.
When looking at the factor scores for the columns (feelings), we see many points near the center. The points seen in
the contribution barplot are reflected in this graph; for example, Romantic and Melancholic are on the positive end
of dimension 2. But I want to see whether the other points are true averages or belong in a different dimension. For
that, we move on to the cosine circle.
In the cosine circle we see that most of the sensory feelings are close to or touching the edge of the circle, which
shows they are well explained by these two dimensions. Thirsty, Sad, and Refreshed are the only ones not near the
edge, which tells me they are better explained by another dimension.
The final thing I will look at from this analysis is the rows-and-columns cosine circle, related to the chi-squared
residuals graph. The two graphs say the same thing. For example, the chi-squared graph tells us that Bafar should be
close to the Famished point (big blue circle) and far from the Romantic point (big red circle); comparing this to the
cosine circle, we see that this is indeed true. The same holds for all rows and columns.
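The chi-squared residuals behind that graph can be sketched in two lines from the same contingency table, reusing `N` from the CA sketch above:

```python
# Observed minus expected counts under independence, scaled by the expected
# counts; large positive residuals mark attractions (e.g. Bafar and Famished)
expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
residuals = (N - expected) / np.sqrt(expected)
```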
Summary
To conclude, two dimensions explain well the relationship between the sausages and the sensory feelings reported by
tasters. Judging by their distance in the cosine circle, Chimex and Capistrano are very similar to each other as far as
the columns go. Alpino and Bafar are also close to each other, but we cannot rely on that because of Alpino's
distance from the edge of the circle. Duby is the most different sausage in relation to the others. As for taste, Duby
gives the sense of Guilty, Depressed, and Joy; Bafar gives the sense of Salivating, Energetic, Soothed, and Famished;
and Chimex and Capistrano give the feelings of Romantic, Well.being, and Melancholic.
Multiple Correspondence Analysis
Multiple correspondence analysis is a method for analyzing multiple qualitative variables. If our data set has
quantitative variables, we must first translate them into qualitative variables. Much of MCA is like CA and PCA,
such as how we look at the chi-squared heat map. The difference is that, because the variables are nominal, we must
interpret columns that represent the different levels of one variable. We do this by disjunctively coding the variables
into 1's and 0's, so that one variable is now represented by a set of columns. The variables are combined into new
components, and the amount of each variable in a component is given by its loading.
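A minimal sketch of the disjunctive coding step, assuming a hypothetical file name and the column labels described below:

```python
import pandas as pd

df = pd.read_csv("mhl.csv")                        # hypothetical file name
X = pd.get_dummies(df[["RaceCat", "AgeCat", "ClinCourse",
                       "MajorCat", "Experience"]], prefix_sep=".")
# MCA is then CA run on this 0/1 indicator matrix (see the CA sketch above)
```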
Data
In the MHL study, researchers wanted to investigate what factors impact mental health literacy (MHL) in the
college-aged population. The variables were race ("RaceCat"), age ("AgeCat"), whether a student has taken a clinical
course ("ClinCourse"), major ("MajorCat"), and the student's experience with mental health. The supplementary
variable is gender. Our observations are grouped by MHL score ("low", "med", "high").
Methods
To begin, we look at the MCA scree plot and decide how many dimensions to examine. For simplicity, I am focusing
on the first two dimensions, although if we wanted to get more out of this study, we could look at up to four.
When looking at the factor score map, we can extract the two most important contributions based on how far they
pass the dimension's contribution threshold. We see that Major makes up the positive ends of dimensions 1 and 2,
and ClinCourse makes up the positive end of the first dimension.
In the bar plot on the left, we can confirm that dimension 1 is made up of Major and ClinCourse. For dimension 2,
the bar plot tells us that Major and Race are the primary contributors, which shows that the threshold on the factor
score map could also include Race.
When a pseudo-bootstrap analysis is run on the first two dimensions for these variables, we see that if this experiment
were replicated with an infinite population, all variables except Experience would be relevant for the first dimension,
and Major, Race, and Age would be significant contributors to the second dimension.
The chi-squared heat map shows coefficients of correlation. The highest correlation between our variables is between
ClinCourse and Major (.44), which suggests that a student's major is positively correlated with whether they take a
clinical course. The map also shows a very low positive correlation (.04) between a student's level of experience and
whether they have taken a clinical course ("Experience", "ClinCourse").
The factor scores with confidence intervals for our observations show that the mean values of the score groups
("low", "med", "high") are reliably separate, but when we look at the overlapping tolerance intervals, we cannot
confidently assign an individual observation to a score group.
The next thing to look at is the factor scores for our important variables. We already saw above that "MajorCat" and
"ClinCourse" are significant contributors to our dimensions; here we can dissect further and see which levels of these
variables contribute to which dimension. The negative end of the first dimension is made up of "MajorCat.STEM",
"MajorCat.Hu/So" (humanities and social sciences), "MajorCat.Econ", and "ClinCourse.NoClin"; the positive end of
the first dimension of "MajorCat.Psyc", "MajorCat.Educ", "ClinCourse.Clin", and "MajorCat.ApMed". The second
dimension is made up of "MajorCat.Econ", "MajorCat.ApMed" (applied health science), and "ClinCourse.NoClin"
on the negative end, and "MajorCat.Educ", "ClinCourse.Clin", "MajorCat.Psyc", "MajorCat.Hu/So", and
"MajorCat.STEM" on the positive end. This tells us which levels of the majors make up which dimension, and we
might be able to use it to find out which levels line up with which MHL score. It is also apparent that some variables
were positive on one dimension but negative on the other. To see which variable is best represented in which
dimension, we look to the bar graphs.
Here we can see how well each dimension tells us about each variable. For example, the "ClinCourse" levels, along
with a few "MajorCat" levels, are best represented in the first dimension, while "AgeCat", "RaceCat.NaAm", and
"RaceCat.Wh/Ca" are significant contributors to the second dimension. The relationships between the variables and
the dimensions are clearer here than before; for example, we can hypothesize that having taken a clinical course is
related to achieving a higher MHL score.
If we run a bootstrap analysis, dimension 2 is the only one that differs from the raw data: many new variables become
significant contributors to consider if we want to replicate this study.
The factor score map for all the variables is very busy, and we cannot tell from it which variables are true averages;
to decide, we must compare it to the cosine circle. The true averages are "AgeCat.18to22", "RaceCat.Bl/Af/Am",
"ClinCourse.NoClin", and possibly "MajorCat.Hu/So". The only variables that are well explained by two dimensions
are "RaceCat.Multi", "ClinCourse.Clin", "MajorCat.Psyc", "RaceCat.Wh/Ca", "ClinCourse.NoClin",
"MajorCat.STEM", "MajorCat.Hu/So", and "AgeCat.18to22". The rest of the variables are better examined in their
respective dimensions.
For our supplementary variable, "Male" makes up the negative end of dimension 1 and the positive end of dimension
2; the opposite is true for "Female". We might be able to infer from this which gender scores higher on the MHL test.
Summary
Through this MCA we can conclude that a high MHL score depends on multiple qualitative variables.
Our supplementary variable plot suggests that females tend to score higher on the MHL test than males, but with a
very low eigenvalue we cannot assert this confidently.
What we can assume is that taking a clinical course is correlated with a high MHL score. Major also relates to a
student's score, with STEM, humanities/social science, economics, and applied health science majors associated with
low MHL scores, as opposed to psychology and education majors, who are associated with higher MHL scores.
As for demographics, it appears that students who are female, white, and above the age of 28 score higher on the
MHL test than males who are Black/African American, Hispanic/Latino, Asian, or Native American and between
the ages of 18 and 22, who generally score lower. Because of the overlapping tolerance intervals, we cannot reliably
assign an MHL score to any single major, although the group averages do differ. And although we do see some high
correlations in this analysis, the study's design, logistical issues, and low eigenvalues tell us that we should not
assume any definite causal relationships.
Discriminant Correspondence Analysis
Discriminant correspondence analysis (DiCA) is a method for analyzing qualitative, categorical, or nominal data. The
same steps as in CA and MCA apply here (such as disjunctive coding), and we plot our data to exploit distributional
equivalence. After computing the barycenters of the groups, we plot our observations and can see the relationships
between the variables. This method is exactly like BADA except that we are not using quantitative data.
Data
The same MHL data will be used for this analysis, in which mental health literacy was tested. The variables were race
("RaceCat"), age ("AgeCat"), whether a student has taken a clinical course ("ClinCourse"), major ("MajorCat"), and
the student's experience with mental health. The supplementary variable is gender. Our observations are grouped by
MHL score ("low", "med", "high").
Methods
The eigenvalue scree plot has only two dimensions, so these are the ones we will look at.
For the MHL score groups, we see a difference from MCA on dimension 2: dimension 1 still separates the low and
high scores, but now we also see a slight separation between low/high and medium scores on the second dimension.
The confidence intervals show that the mean scores are reliably different from each other, but when we look at the
tolerance intervals, we cannot reliably assign an individual score to a group because of the overlapping hulls.
In our factor score map, it is hard to determine which variables are significant on the first dimension without looking
at the bar plot. It is safe to assume that ClinCourse.Clin, RaceCat.Wh/Ca, MajorCat.Psyc, and AgeCat.28Plus are on
the positive end and that ClinCourse.NoClin, MajorCat.STEM, MajorCat.Econ, and RaceCat.His/La are on the
negative end. These scores show which variables tend to go with higher and lower MHL scores, respectively.
In our contribution bar plot we see that the first dimension separates ClinCourse.Clin, MajorCat.Psyc, and
RaceCat.Wh/Ca on the positive end from ClinCourse.NoClin, MajorCat.Econ, and MajorCat.STEM on the negative
end. This confirms my assumptions from the factor score map. When a bootstrap analysis is run on this dimension,
almost every level of the variables becomes a significant contributor, which tells us what we would need to consider
to replicate this experiment with an infinite population.
The second dimension separates MajorCat.Educ and MajorCat.Psyc on the positive end from MajorCat.Econ on the
negative end, giving us more detail than the factor score map about which contributions are important. A bootstrap
analysis of this dimension leaves only MajorCat.Psyc and MajorCat.Educ as significant contributors, telling us that
these are the only two to consider on this dimension for replication.
To see which variables contribute more to our analysis, we can look at this variable contribution map. It shows that
the first dimension consists of AgeCat, Experience, and ClinCourse, while the second dimension consists of
MajorCat, ClinCourse, AgeCat, and Experience.
For the confusion matrices, the fixed results have slightly higher accuracy than the leave-one-out (LOO) results.
Here we see when scores are predicted correctly versus incorrectly. There is not much of a difference between the two
methods, but overall the prediction accuracy of DiCA is low. This leads me to think that either more data is needed
or a different analysis method would yield higher accuracy.
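The fixed-versus-LOO comparison can be sketched with a nearest-centroid classifier standing in for DiCA's assign-to-the-closest-barycenter rule; `row_scores` (observation factor scores) and the "MHL" labels are assumed to come from the fitted analysis:

```python
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import NearestCentroid
from sklearn.metrics import confusion_matrix

y = df["MHL"]                                       # true group labels (assumed)
clf = NearestCentroid()
# fixed: train and predict on the same data; LOO: hold out one row at a time
fixed = confusion_matrix(y, clf.fit(row_scores, y).predict(row_scores))
loo = confusion_matrix(y, cross_val_predict(clf, row_scores, y, cv=LeaveOneOut()))
```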
Summary
We can conclude from this analysis that those who major in psychology or education, have taken a clinical course,
are white/Caucasian, and are above the age of 28 are more likely to have high mental health literacy. Those with
lower mental health literacy scores tend to major in economics, STEM, or the humanities/social sciences and to have
not taken a clinical course. The accuracy scores tell us that we cannot reliably make these predictions; further
analysis, or more data, is needed for reliability.
Citation
Miles, Rona, et al. “Mental Health Literacy in a Diverse Sample of Undergraduate Students: Demographic,
Psychological, and Academic Correlates.” BMC Public Health, vol. 20, no. 1, 2020,
https://doi.org/10.1186/s12889-020-09696-0.
Partial Least Squares Correlation
Partial Least Squares Correlation is an analytical technique in which we normalize our data by rows instead of by
columns. In our analysis, we work with "pancake" data: many variables and few observations, all of which are
multicollinear. The goal of PLSC is to find the components that maximize the covariance between latent variables
computed from two quantitative matrices. It relates two tables to each other in order to find what they have in
common. The components are composed of saliences (loadings) and latent variables (factors), and we obtain one
pair of latent variables, one from each data matrix, per component.
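A minimal sketch of this computation, assuming the two tables are available as numpy arrays (the file name and the column split below are assumptions):

```python
import numpy as np

college = np.loadtxt("college.csv", delimiter=",", skiprows=1)  # hypothetical file
Xc, Yc = college[:, :9], college[:, 9:]   # cost vs. performance split (assumed)

# z-score each table, then take the SVD of the cross-correlation matrix;
# the singular vectors are the saliences, and projecting each table onto its
# saliences gives the paired latent variables
Zx = (Xc - Xc.mean(0)) / Xc.std(0)
Zy = (Yc - Yc.mean(0)) / Yc.std(0)
R = Zx.T @ Zy
U, d, Vt = np.linalg.svd(R, full_matrices=False)
Lx, Ly = Zx @ U, Zy @ Vt.T                # pairs of latent variables
print(d ** 2 / (d ** 2).sum())            # share of covariance per component
```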
Data
The data comprise 777 colleges and universities and many variables. The schools are grouped into "Private" and
"Public". The X-set of variables relates to costs, while the Y-set relates more to performance.
Apps: Number of applications received
Accept: Number of applications accepted
Enroll: Number of new students enrolled
Top10perc: Pct. new students from top 10% of H.S. class
Top25perc: Pct. new students from top 25% of H.S. class
F.Undergrad: Number of fulltime undergraduates
P.Undergrad: Number of parttime undergraduates
Outstate: Out-of-state tuition
Room.Board: Room and board costs
Books: Estimated book costs
Personal: Estimated personal spending
PhD: Pct. of faculty with Ph.D.'s
Terminal: Pct. of faculty with terminal degree
S.F.Ratio: Student/faculty ratio
perc.alumni: Pct. alumni who donate
Expend: Instructional expenditure per student
Grad.Rate: Graduation rate
Methods
In the eigenvalue scree plot, judging by the Kaiser line, one component holds almost all the variance. For simplicity,
I am going to look at the first two components, which together explain about 98% of the variance.
The important contributions to the Y latent variable dimension are made by Grad.Rate, Terminal, PhD, Top25perc,
and Top10perc on the negative end and S.F.Ratio on the positive end.
A bootstrap analysis of the Y-set adds P.Undergrad, F.Undergrad, Enroll, and Apps, suggesting that these are also
important variables to consider when replicating this experiment.
The important contributions to the X latent variable dimension seem to be made only by Outstate, perc.alumni, and
Expend, all of which are on the negative end.
A bootstrap ratio of the X-set adds Room.Board on the negative end and Personal on the positive end, suggesting
that these are important variables to consider when replicating this experiment with an infinite population.
This latent variable map shows that Private and Public are separated along the X latent variable. It can also be
observed that the means and confidence intervals of Private and Public are significantly different, which suggests
that these two groupings of colleges can reliably be separated from one another in terms of their means and group
assignment.
The correlation heat map confirms many of the relationships between our variables that we might expect. We can
see groups of high positive correlations between variables such as Expend:Top10perc, Expend:Top25perc,
perc.alumni:Top10perc, perc.alumni:Top25perc, and so on. The strongest negative correlations are observed in the
S.F.Ratio column and in the Grad.Rate:Personal pair.
Summary
The visualization of this data tells us a lot about the colleges and their groupings. Being in the top 10% or 25% of
one's high-school class is most strongly associated with instructional expenditure per student, room and board costs,
and out-of-state tuition; these students also seem to be the ones who donate the most as alumni. The percentage of
faculty with PhDs seems related to most of the same variables (Outstate, Room.Board, perc.alumni, Expend), and
the same goes for the percentage with terminal degrees and for graduation rate, although graduation rate seems to
go down as personal spending goes up. Lastly, there is a strong negative correlation between the student/faculty
ratio and instructional expenditure, percentage of alumni who donate, room and board costs, and out-of-state tuition,
suggesting a disadvantage when there are more students per faculty member.
As for the groupings of these colleges (public/private), there is a reliable difference: the latent variable map shows
that the schools are clearly separated into private and public, both in means and in confidence intervals. This
suggests that there truly is a difference between attending a private school and a public school.
DiSTATIS
DiSTATIS is a method for analyzing multiple tables of data (three or more). In this method, the
variables are whole data tables, and we find the best linear combination of these tables in order
to create a new compromise table from the old ones. In DiSTATIS, the latent variables are called partial
projections: partial because, when we combine them all, we get the whole latent variable, and we
have one per original data table. To run DiSTATIS, the data are processed with ideas from
multiple factor analysis and multidimensional scaling to obtain distance matrices; these distance
tables are then converted into pseudo-covariance tables using double centering.
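A sketch of the double centering and of the RV coefficients used below, assuming the participants' distance matrices have already been built from the sorting data (file names are hypothetical):

```python
import numpy as np

# one 30x30 beer-by-beer distance matrix per participant (assumed files)
D = [np.loadtxt(f"judge_{k}.csv", delimiter=",") for k in range(1, 52)]

def double_center(Dk):
    """Convert a squared-distance matrix into a pseudo-covariance matrix."""
    n = Dk.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return -0.5 * J @ (Dk ** 2) @ J

S = [double_center(Dk) for Dk in D]

def rv(A, B):
    """RV coefficient: a matrix-level analogue of a squared correlation."""
    return np.trace(A @ B) / np.sqrt(np.trace(A @ A) * np.trace(B @ B))

Crv = np.array([[rv(Si, Sj) for Sj in S] for Si in S])  # between-judges RV map
# the compromise is the weighted average of the S matrices, with weights taken
# from the first eigenvector of Crv
```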
Data
The data I was given is from a sorting task in which participants sorted Mexican beers (rows).
Participants are grouped by gender (M/W).
Participants: C1-C51
Beers: Minerva PA, Cucapa Miel, Tempus Clasica, Tempus DM, Calavera MIS, Minerva Stout, St
Peters, Calavera APA, Tempus Dor, Jack, Patricia, Cucapa CH, 7 Barrios, Alebrije, Ramuri, Corona,
Indio, Victoria, Leon, Bohemia, Modelo, Heineken, Negra Modelo, Pacifico, Tecate, Bohemia Osc,
Guiness, Carolus, Sol, Noche Buena
The first scree plot shows that our variance is explained largely by one dimension, which is what I will
analyze. This strong first eigenvalue (a large share of the variance on the first component) tells us that
people generally agree on the sorting of the beers.
The RV map between judges shows the correlations between the different participants. The diagonal
consists of all 1's, which shows that each participant does not differ from themselves, obviously. As for
those that do differ, judges C51 and C1 did not sort the beers the same way, whereas judges C5 and C2
sorted the beers very similarly.
This RV factor map shows the difference between men and women on the sorting task. There seems to
be a general separation between the genders: men fall more on the negative end of the first dimension
and women more on the positive end. The confidence intervals, however, tell me that the means of the
men and the women do not differ and that we cannot reliably assign participants to groups. The second
dimension explains only 5% of the variance, so I will only consider the first dimension to explain the data.
The compromise scree plot shows us the dimensions for the products. We again see a big drop-off in
variance, which tells us that the participants generally agree on the sorting of the beers. The compromise
is the weighted average of the participants' sortings of the products. Here we can look at two dimensions
to explain the variance.
The heat map of relationships between the rows (the beers) looks the way we would expect after seeing
the scree plot. We see big blocks of correlation along the diagonal, which tells us that there was general
agreement in sorting these beers: the participants saw a positive correlation within the first 15 beers, a
positive correlation within the last 15 beers, and negative correlations between the first 15 and the last 15.
This compromise factor map shows us the sub-groupings among the products. There seem to be three
clear sub-groups among the Mexican beers, likely reflecting the style of beer, such as lager, light, and
dark. These groupings are separated on both dimensions, but the second dimension explains only 10%
of the variance whereas the first explains 34%, so the separation of these groups on the first dimension
carries more weight.
The partial factor score map on the compromise shows us the difference between men's and women's
ratings of these products. Since the lines connected to most beers are about the same length, we can
assume that, for most of the beers, men and women sorted them the same way. There are a couple of
beers for which the lines differ in length, such as Guiness (the one closest to the middle of the graph),
which shows a difference between women's and men's ratings of that beer.
Summary
We can infer that men and women generally do not sort Mexican beers much differently from one
another. From additional research, it seems that the beers separated on the factor score map differ in
qualities like lightness and bitterness. The beers on the negative side of the first dimension seem to be
clean, crisp beers (except for Guiness, which is a dark beer). The subgroup on the positive end of the
second dimension seems to consist of lighter, brighter beers, and the subgroup on the negative end of
darker, more bitter beers. In general, judging by the partial factor score map, men and women rated
these beers the same, with a couple of exceptions such as Guiness and Heineken. The scree plots and
heat maps also tell us that there was general agreement among these participants on the sorting of the
Mexican beers, with a steep drop-off in the eigenvalues and large blocks of correlation in the row heat map.